Combining LSI and Vector Space to Improve Retrieval Performance

Author

  • April Kontostathis
Abstract

We describe an approach to information retrieval that combines two well-known techniques: traditional vector space retrieval and Latent Semantic Indexing (LSI). The technique assigns each query-document pair a similarity that is a weighted average of the similarity scores obtained by LSI and by traditional vector space retrieval. Our approach improves retrieval performance by 12%, on average, over vector space retrieval on the seven collections studied. Four of the collections were chosen because their retrieval performance is higher when vector space retrieval is used instead of LSI, and our method improves retrieval performance by an average of 14% for these collections. Our work shows that LSI can be used to significantly improve retrieval performance on collections that previously did not benefit from LSI. We also show that a small, fixed LSI dimensionality reduction parameter (k=10) can be used to capture the ‘latent semantic’ information.
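
The weighted average described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the use of cosine similarity, the function names, the default `weight` value, and the way the query is projected into the LSI space are all assumptions made for illustration. Only the weighted-average idea and k=10 come from the abstract.

```python
import numpy as np


def cosine_scores(query_vec, doc_matrix):
    """Cosine similarity between a query vector and each column of doc_matrix."""
    denom = np.linalg.norm(query_vec) * np.linalg.norm(doc_matrix, axis=0)
    # Guard against zero-length vectors (assumed to score 0).
    denom = np.where(denom == 0, 1.0, denom)
    return (query_vec @ doc_matrix) / denom


def combined_scores(term_doc, query, k=10, weight=0.2):
    """Weighted average of LSI and vector space scores (illustrative only).

    term_doc: terms x documents matrix; query: term-space vector.
    weight:   hypothetical share given to the LSI score (not from the source).
    """
    # Traditional vector space retrieval: compare in the full term space.
    vs = cosine_scores(query, term_doc)

    # LSI: rank-k truncated SVD of the term-document matrix.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, len(s))
    docs_k = s[:k, None] * Vt[:k, :]   # documents in the k-dimensional space
    query_k = U[:, :k].T @ query       # query projected into the same space
    lsi = cosine_scores(query_k, docs_k)

    return weight * lsi + (1.0 - weight) * vs
```

Setting `weight=0` reduces the score to plain vector space retrieval and `weight=1` to pure LSI, so the parameter controls how much of the low-dimensional signal is mixed in.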

Similar resources

On the Performance of Latent Semantic Indexing based Information Retrieval

Conventional vector-based Information Retrieval (IR) models, the Vector Space Model (VSM) and the Generalized Vector Space Model (GVSM), represent documents and queries as vectors in a multidimensional space. This high-dimensional data places great demands on computing resources. To overcome these problems, Latent Semantic Indexing (LSI), a variant of VSM, projects the documents into a lower dimensiona...

LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier

The task of Text Classification (TC) is to automatically assign natural language texts to thematic categories from a predefined category set. Latent Semantic Indexing (LSI) is a well-known technique in Information Retrieval, especially for dealing with polysemy (one word can have different meanings) and synonymy (different words are used to describe the same concept), but it is not an opti...

EDLSI with PSVD Updating

This paper describes the results obtained from the merging of two techniques that provide improvements to search and retrieval using Latent Semantic Indexing (LSI): Essential Dimensions of LSI (EDLSI) and partial singular value decomposition (PSVD) updating. EDLSI utilizes an implementation of LSI that requires the use of only a few dimensions in the LSI space. The PSVD updating and folding-up ...

A Comparison of Two Corpus-Based Methods for Translingual Information

In translingual information retrieval (TIR), ad hoc queries in any of a set of languages can be used to retrieve documents in any of a set of languages. Classical information-retrieval methods such as the vector-space model cannot be applied to TIR because they base similarity on the overlap of terms between queries and documents; this is typically zero in TIR. The generalized vector-space mode...

A Comparison of Two Corpus-Based Methods for Translingual

In translingual information retrieval (TIR), ad hoc queries in any of a set of languages can be used to retrieve documents in any of a set of languages. Classical information-retrieval methods such as the vector-space model cannot be applied to TIR because they base similarity on the overlap of terms between queries and documents; this is typically zero in TIR. The generalized vector-space model ...

Publication date: 2005